Abstract. We revisit a concept that has been central in some early 
stages of computer science, that of structured programming: a set of rules 
that an algorithm must follow in order to acquire a structure that is de- 
sirable in many aspects. While much has been written about structured 
programming, an important issue has been left unanswered: given an ar- 
bitrary, compiled program, describe an algorithm to decide whether or 
not it is structured, that is, whether it conforms to the stated principles of 
structured programming. We refer to the classical concept of structured 
programming, as described by Dijkstra. By employing a graph model 
and graph-theoretic techniques, we formulate an efficient algorithm for 
answering this question. To do so, we first introduce the class of graphs 
which correspond to structured programs, which we call Dijkstra Graphs. 
Our problem then becomes the recognition of such graphs, for which we 
present a greedy O(n)-time algorithm. Furthermore, we describe an iso- 
morphism algorithm for Dijkstra graphs, whose complexity is also linear 
in the number of vertices of the graph. Both the recognition and isomorphism 
algorithms have potentially important applications, such as in code 
similarity analysis. 


Keywords: graph algorithms, graph isomorphism, reducibility, struc- 
tured programming 


1 Introduction 


Structured programming was one of the main topics in computer science in 
the years around 1970. It can be viewed as a method for the development and 
description of algorithms and programs. Basically, it consists of a top-down for- 
mulation of the algorithm, breaking it into blocks or modules. The blocks are 
stepwise refined, possibly generating new, smaller blocks, until refinements no 
longer exist. The technique constrains the description of the modules to contain 
only three basic control structures: sequence, selection and iteration. The first 
of them corresponds to sequential statements of the algorithm; the second refers 
to comparisons leading to different outcomes; the last one corresponds to sets of 
actions performed repeatedly in the algorithm. 


One of the early papers about structured programming was the article by 
Dijkstra “Go-to statement considered harmful” [8], which argued that 
the unrestricted use of go-to statements is incompatible with well-structured 
algorithms. That paper was soon followed by a discussion in the literature about 
go-to’s, as in the papers by Knuth and Floyd [18], Wulf [34] and Knuth [17]. 
Other classical papers are those by Dahl and Hoare [9], Hoare [16] and Wirth [28], 
among others. Guidelines of structured programming were established in an 
article by Dijkstra [10]. The early development of programming languages containing 
blocks, such as ALGOL (Naur [23]) and PASCAL (Wirth [29]), was an important 
reason for the widespread adoption of structured programming. The concept was 
further developed by Kosaraju [20], who described the idea of 
reducibility among flowcharts. Moreover, [20] introduced and characterized 
the class of D-charts, which are graphs properly containing all those that 
originate from structured programming. Williams [32] also describes variations 
of different forms of structuredness, including the basic definitions by Dijkstra, 
as well as D-charts. The different forms of unstructuredness were described in 
papers by Williams [31] and McCabe [22]. The conversion of an unstructured flow 
diagram into a structured one has been considered by Williams and Ossher [33], 
and Oulsnam [24]. Formal aspects of structured programming include the papers 
by Böhm and Jacopini [4], Harel [12], and Kozen and Tseng [21]. A mathematical 
theory for modeling structuredness, designed for flow graphs in general, has 
been described by Fenton, Whitty and Kaposi [11]. The concept of structured 
programming influenced the development of algorithms for solving 
various problems in different areas right from the start, either explicitly, 
as in the papers by Henderson and Snow [15], and Knuth and Szwarcfiter [19], 
or implicitly as in the various graph algorithms by Tarjan, e.g. [25,26]. 


A natural question regarding structured programming is to recognize whether 
a given program is structured. To our knowledge, this question has been 
solved neither in the early stages of structured programming nor later. That is 
the main purpose of the present paper. We formulate an algorithm for recognizing 
whether a given program is structured, according to Dijkstra’s concept of 
structured programming. Note that the input comprises the binary code, not the 
source code. A well-known representation that comes in handy is the control 
flow graph (CFG) of a program, employed by the majority of reverse-engineering 
tools to perform data-flow analysis and optimizations. A CFG represents the 
intraprocedural computation of a function by depicting the existing links across 
its basic blocks. Each basic block represents a straight-line sequence of the 
program’s instructions, ending (possibly) with a branch. An edge A → B (from the exit of 
block A to the start of block B) represents the program flowing from A to B at 
runtime. 
We are then interested in the version of the recognition problem which takes 
as input a control flow graph of the program [1,5]: a directed graph represent- 
ing the possible sequences of basic blocks along the execution of the program. 
Our problem thus becomes graph-theoretic: given a control flow graph, decide 
whether it has been produced by a structured program. We apply a reducibil- 
ity method, whose reduction operations iteratively obtain smaller and smaller 
control flow graphs. 

In this paper, we first define the class of graphs which correspond to 
structured programs, as considered by Dijkstra in [10]. We call the graphs of this 
class Dijkstra graphs. We describe a characterization that leads to a greedy 
O(n) time recognition algorithm for a Dijkstra graph with n vertices. Among the 
potential direct applications of the proposed algorithm, we can mention software 
watermarking via control flow graph modifications [3,6]. 

Additionally, we formulate an isomorphism algorithm for the class of Dijk- 
stra graphs. The method consists of defining a convenient code for a graph of the 
class, which consists of a string of integers. Such a code uniquely identifies the 
graph, and it is shown that two Dijkstra graphs are isomorphic if and only if their 
codes are the same. The code itself has size O(n) and the time complexity of the 
isomorphism algorithm is also O(n). In case the given graphs are isomorphic, the 
algorithm exhibits the isomorphism function between the graphs. Applications 
of isomorphism include code similarity analysis [7], since the method can deter- 
mine whether apparently distinct control flow graphs (of structured programs) 
are actually structurally identical, with potential implications in digital rights 
management. 


2 Preliminaries 


In this paper, all graphs are finite and directed. For a graph G, we denote 
its vertex and edge sets by V(G) and E(G), respectively, with |V(G)| = n, 
|E(G)| = m. For $v, w \in V(G)$, an edge from v to w is written as vw. We say vw 
is an out-edge of v and an in-edge of w, with w an out-neighbor of v, and v an 
in-neighbor of w. We denote by $N_G^+(v)$ and $N_G^-(v)$ the sets of out-neighbors and 
in-neighbors of v, respectively. We may drop the subscript when the graph is clear 
from the context. Also, we write $N^{2+}(v)$ meaning $N^+(N^+(v))$. For $v, w \in V(G)$, 
v reaches w when there is a path in G from v to w. A source of G is a vertex 
that reaches all other vertices in G, while a sink is one which reaches no vertex, 
except itself. Denote by s(G) and t(G), respectively, a source and a sink of G. 
A (control) flow graph G is one which contains a distinguished source s(G). A 
source-sink graph contains both a distinguished source s(G) and distinguished 
sink t(G). A trivial graph contains a single vertex. 
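The notation above can be made concrete with a small sketch. It assumes a graph given as a collection of vertices plus a list of (v, w) pairs; the function names (`neighborhoods`, `reaches`, `second_out`, `sources`, `sinks`) are illustrative choices and do not come from the paper.

```python
def neighborhoods(vertices, edges):
    """Return dicts N+ and N- mapping each vertex to its out-/in-neighbor set."""
    n_out = {v: set() for v in vertices}
    n_in = {v: set() for v in vertices}
    for v, w in edges:
        n_out[v].add(w)
        n_in[w].add(v)
    return n_out, n_in


def reaches(n_out, v):
    """Set of vertices reachable from v (v reaches itself by the empty path)."""
    seen, stack = {v}, [v]
    while stack:
        x = stack.pop()
        for y in n_out[x] - seen:
            seen.add(y)
            stack.append(y)
    return seen


def second_out(n_out, v):
    """N^{2+}(v) = N+(N+(v)): the out-neighbors of the out-neighbors of v."""
    return set().union(*(n_out[u] for u in n_out[v]))


def sources(n_out):
    """Vertices that reach all other vertices."""
    return [v for v in n_out if reaches(n_out, v) == set(n_out)]


def sinks(n_out):
    """Vertices that reach no vertex except themselves."""
    return [v for v in n_out if reaches(n_out, v) == {v}]


# A three-vertex example: v -> u, v -> w, u -> w.
n_out, n_in = neighborhoods("vuw", [("v", "u"), ("v", "w"), ("u", "w")])
```

In this example v is the only source and w the only sink, so the graph is source-sink.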

A graph with no directed cycles is called acyclic. In an acyclic graph, if there 
is a path from vertex v to vertex w, then v is an ancestor of w, and the latter a 
descendant of v. Additionally, if v, w are distinct then v is a proper ancestor, 
and w a proper descendant. Let G be a flow graph with source s(G), and C 
a cycle of G. The cycle C is called a single-entry cycle if it contains a vertex 
$v \in C$ that separates s(G) from the vertices of $C \setminus \{v\}$. A flow graph in which 
each of its cycles is a single-entry cycle is called reducible. Reducible graphs were 
characterized by Hecht and Ullman [13,14]. An efficient recognition algorithm 
for this class has been described by Tarjan [27]. 

In a depth-first search (DFS) of a directed graph, in each step a vertex is 
inserted in a stack, or removed from it. Every vertex is inserted and removed 
from the stack exactly once. An edge $vw \in E(G)$, such that v is inserted in the 
stack after w, and before the removal of w, is called a cycle edge. Let C be the 
set of cycle edges of a graph, relative to some DFS. Clearly, the graph $G - C$ 
is acyclic. The following characterization of reducible flow graphs is relevant for 
our purposes. 


Theorem 1 [14,27] A flow graph G is reducible if and only if, for any depth-first 
search of G starting from s(G), the set of cycle edges is invariant. 
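To illustrate Theorem 1, the following sketch computes the cycle edges of a DFS that follows each adjacency list in the order given, so that different list orders model different depth-first searches. The function name and the dictionary-of-lists representation are assumptions made for this example.

```python
def cycle_edges(succ, source):
    """Cycle edges of the DFS from `source` that visits succ[v] in list order."""
    on_stack, finished, cycle = set(), set(), set()

    def visit(v):
        on_stack.add(v)
        for w in succ[v]:
            if w in on_stack:        # w was inserted before v and is not yet removed
                cycle.add((v, w))
            elif w not in finished:
                visit(w)
        on_stack.remove(v)
        finished.add(v)

    visit(source)
    return cycle


# A while-shaped flow graph (reducible), traversed in two different orders.
while_a = {1: [2, 3], 2: [1], 3: []}
while_b = {1: [3, 2], 2: [1], 3: []}

# A cycle 2 <-> 3 that can be entered at 2 or at 3 (not single-entry).
two_entry_a = {1: [2, 3], 2: [3], 3: [2]}
two_entry_b = {1: [3, 2], 2: [3], 3: [2]}
```

Both traversals of the while-shaped graph yield the single cycle edge (2, 1), whereas the two traversals of the second graph disagree, yielding (3, 2) in one case and (2, 3) in the other.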


In a flow graph G, we may write DFS of G to mean a DFS of G 
starting from s(G). In addition, if G is also reducible, based on the above theorem, 
we may use the terms ancestor or descendant of G to mean ancestor or 
descendant of $G - C$, where C is the (unique) set of cycle edges of G. 

A topological sort of a graph G is a sequence $v_1, \ldots, v_n$ of its vertices, such 
that $v_i v_j \in E(G)$ implies $i < j$. It is well known that G admits a topological 
sort if and only if G is acyclic. Finally, two graphs $G_1, G_2$ are isomorphic when 
there is a one-to-one correspondence $f : V(G_1) \to V(G_2)$ such that $vw \in E(G_1)$ 
if and only if $f(v)f(w) \in E(G_2)$. In this case, write $G_1 \cong G_2$, and call f an 
isomorphism function between $G_1, G_2$, with f(v) being the image of v under f. 
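Both notions can be checked mechanically. The sketch below uses Kahn's algorithm for the topological sort (one standard choice, not necessarily the one the authors have in mind) and a direct check of the isomorphism condition for a candidate function f given as a dictionary. All names are illustrative.

```python
from collections import deque


def topological_sort(vertices, edges):
    """Return a topological sort of the graph, or None if it has a directed cycle."""
    succ = {v: [] for v in vertices}
    indegree = {v: 0 for v in vertices}
    for v, w in edges:
        succ[v].append(w)
        indegree[w] += 1
    queue = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)
    return order if len(order) == len(vertices) else None


def is_isomorphism(f, vertices1, edges1, vertices2, edges2):
    """Check that f is one-to-one from V(G1) onto V(G2) and preserves edges both ways."""
    one_to_one = (set(f) == set(vertices1)
                  and set(f.values()) == set(vertices2)
                  and len(set(f.values())) == len(f))
    return one_to_one and {(f[v], f[w]) for v, w in edges1} == set(edges2)


order = topological_sort([1, 2, 3], [(1, 2), (1, 3)])
cyclic = topological_sort([1, 2, 3], [(1, 2), (2, 1), (1, 3)])
```

For a one-to-one f, comparing the image of E(G_1) with E(G_2) is equivalent to the "if and only if" condition of the definition.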


3 The Graphs of Structured Programming 


In this section, we describe the graphs of structured programming, as estab- 
lished by Dijkstra in [10], leading to the definition of Dijkstra graphs. First, we 
introduce a family of graphs directly related to Dijkstra’s concepts of structured 
programming. 

A statement graph is defined as being one of the following: 


(a) trivial graph 
(b) sequence graph 
(c) if graph 
(d) if-then-else graph 
(e) p-case graph, $p \geq 3$ 
(f) while graph 
(g) repeat graph 
For our purposes, it is convenient to assign labels to the vertices of statement 
graphs as follows. Each vertex is either an expansible vertex, labeled X, or a 
regular vertex, labelled R. See Figures 1 and 2, where the statement graphs are 
depicted with the corresponding vertex labels. All statement graphs are source- 
sink. Vertex v denotes the source of the graph in each case. 
Fig. 1: Statement graphs (a)-(d) 
Fig. 2: Statement graphs (e)-(g) 


Let G be an unlabeled reducible graph, and H a subgraph of G, having source 
s(H) and sink t(H). We say H is closed when 


— $v \in V(H) \setminus \{s(H)\} \Rightarrow N^-(v) \subseteq V(H)$; 
— $v \in V(H) \setminus \{t(H)\} \Rightarrow N^+(v) \subseteq V(H)$; and 
— $v\,s(H)$ is a cycle edge $\Rightarrow v \in N^+(s(H))$. 


In this case, s(H) is the only vertex of H having possible in-neighbors outside 
H, and t(H) the only one possibly having out-neighbors outside H. 
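The three closedness conditions translate directly into code. The sketch below assumes the neighborhoods are stored as dictionaries of sets and that the set of cycle edges has already been computed; the function name `is_closed` and the example graph are hypothetical.

```python
def is_closed(n_out, n_in, cycle, H, s, t):
    """Check the three closedness conditions for vertex set H with source s and sink t."""
    H = set(H)
    # Only s(H) may have in-neighbors outside H.
    ins_ok = all(n_in[v] <= H for v in H - {s})
    # Only t(H) may have out-neighbors outside H.
    outs_ok = all(n_out[v] <= H for v in H - {t})
    # Every cycle edge entering s(H) must start at an out-neighbor of s(H).
    back_ok = all(v in n_out[s] for v in n_in[s] if (v, s) in cycle)
    return ins_ok and outs_ok and back_ok


# Example: 0 -> 1, then a while-shaped part 1 -> 2, 2 -> 1, 1 -> 3.
n_out = {0: {1}, 1: {2, 3}, 2: {1}, 3: set()}
n_in = {0: set(), 1: {0, 2}, 2: {1}, 3: {1}}
cycle = {(2, 1)}
```

Here the vertex set {1, 2, 3} with source 1 and sink 3 is closed, while {0, 1} with source 0 and sink 1 is not, since vertex 1 has the in-neighbor 2 outside the set.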

The following concepts are central to our purposes. 

Let H be an induced subgraph of G. We say H is prime when 


— H is isomorphic to some non-trivial statement graph, and 
— H is closed. 


It should be noted that the while and repeat graphs, respectively (f) and (g) 
of Figure 2, are not isomorphic in the context of reducible flow graphs. In fact, 
the cycle edge makes them distinguishable. In each of these graphs, the source is 
the vertex entered by the cycle edge. Then the sink is an out-neighbor 
of the source in (f), but not in (g). 

Next, let G, H be two graphs, $V(G) \cap V(H) = \emptyset$, H source-sink, $v \in V(G)$. 

The expansion of v into a source-sink graph H (Figure 3) consists of replacing 
v by H, in G, such that 


— $N_G^-(s(H)) = N_G^-(v)$; 
— $N_G^+(t(H)) = N_G^+(v)$; and 
— the remaining adjacencies are unchanged. 


Fig. 3: Expansion operation 
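A minimal sketch of the expansion operation follows, assuming graphs are given as a vertex set and a set of (v, w) pairs with disjoint vertex names. The function `expand` and the vertex names in the example are illustrative only.

```python
def expand(vertices, edges, v, h_vertices, h_edges, s, t):
    """Replace vertex v of G by the source-sink graph H with source s and sink t."""
    assert v in vertices and not (set(vertices) & set(h_vertices))
    new_vertices = (set(vertices) - {v}) | set(h_vertices)
    new_edges = set(h_edges)
    for a, b in edges:
        # In-edges of v now enter s(H); out-edges of v now leave t(H);
        # every other edge is copied unchanged.
        new_edges.add((t if a == v else a, s if b == v else b))
    return new_vertices, new_edges


# G is the path a -> x -> b; x is expanded into a while-shaped H:
# s -> u, u -> s, s -> t.
g_vertices, g_edges = {"a", "x", "b"}, {("a", "x"), ("x", "b")}
h_vertices, h_edges = {"s", "u", "t"}, {("s", "u"), ("u", "s"), ("s", "t")}
new_vertices, new_edges = expand(g_vertices, g_edges, "x", h_vertices, h_edges, "s", "t")
```

After the call, the former in-neighbor a of x points to s, and the former out-neighbor b of x is reached from t.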


Now let G be a graph, and H a prime subgraph of G. The contraction of H 
into a single vertex (Figure 4) is the operation defined by the following steps: 


1. Identify (coalesce) the vertices of H into the source s(H) of H. 
2. Remove all parallel edges and loops. 
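The two steps of the contraction can be sketched as follows, again on a vertex set and an edge set. Storing edges in a Python set removes parallel edges automatically; loops are filtered explicitly. The function name `contract` and the example are assumptions of this sketch, and the function does not itself verify that H is prime.

```python
def contract(vertices, edges, H, s):
    """Coalesce the vertices of H into s, then drop parallel edges and loops."""
    H = set(H)

    def image(x):
        return s if x in H else x

    new_vertices = {image(x) for x in vertices}
    # Step 1: identify the vertices of H with s (a set keeps one copy of each edge).
    merged = {(image(a), image(b)) for a, b in edges}
    # Step 2: remove loops.
    new_edges = {(a, b) for a, b in merged if a != b}
    return new_vertices, new_edges


# a -> s, then a while-shaped prime s -> u, u -> s, s -> t, then t -> b.
vertices = {"a", "s", "u", "t", "b"}
edges = {("a", "s"), ("s", "u"), ("u", "s"), ("s", "t"), ("t", "b")}
small_vertices, small_edges = contract(vertices, edges, {"s", "u", "t"}, "s")
```

The result is the path a -> s -> b, that is, the graph one would expand at its middle vertex to obtain the original.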



Fig. 4: Contraction operation 


We finally have the elements to define the class of Dijkstra graphs. The 
concepts of structured programming and top-down refinement [10] lead naturally 
to the following definition. 


A Dijkstra graph (DG) has vertices labeled X or R recursively defined as: 


1. A trivial statement graph is a DG. 

2. Any graph obtained from a DG by expanding some X-vertex into a non- 
trivial statement graph is also a DG. Furthermore, after expanding an X- 
labeled vertex v into a statement graph H, vertex s(H) is labeled as R. 


An example is given in Figure 5. 
The above definition leads directly to a method for constructing Dijkstra 
graphs, as follows. Find a sequence of graphs $G_0, \ldots, G_k$, such that 


— $G_0$ is the trivial graph, with the vertex labeled X; 
— $G_i$ is obtained from $G_{i-1}$, $i \geq 1$, by expanding some X-vertex v of it into a 
statement graph H. 


Fig. 5: Obtaining a Dijkstra graph via vertex expansions 


The above construction does not imply a polynomial-time algorithm for rec- 
ognizing graphs of the class. In the next section, we describe another character- 
ization which leads to such an algorithm. It is relevant to emphasize that the 
labels are used merely for constructing the graphs. For the actual recognition 
process, we are interested in the problem of deciding whether a given unlabeled 
flow graph is actually a Dijkstra graph. 


4 Recognition of Dijkstra Graphs 


In this section, we describe an algorithm for recognizing Dijkstra graphs. For the 
recognition process, the hypothesis is that we are given an arbitrary flow graph 
G, with no labels, and the aim is to decide whether or not G is a DG. First, we 
introduce some notation and describe the propositions which form the basis of 
the algorithm. 


4.1 Basic Lemmas 
The following lemma states some basic properties of Dijkstra graphs. 
Lemma 2 If G is a Dijkstra graph, then 


(i) G contains some prime subgraph; 
(ii) G is a source-sink graph; and 
(iii) G is reducible. 


Proof. By definition, there is a sequence of graphs $G_0, \ldots, G_k$, where $G_0$ is 
trivial, $G_k = G$ and $G_i$ is obtained from $G_{i-1}$ by expanding some X-vertex 
$v_{i-1} \in V(G_{i-1})$ into a statement graph $H_i \subseteq G_i$. Then no vertex $v_i \in V(H_i)$, 
except $s(H_i)$, has in-neighbors outside $H_i$, and also no vertex $v_i \in V(H_i)$, except 
$t(H_i)$, has out-neighbors outside $H_i$. Furthermore, if $H_i$ contains any cycle then 
$H_i$ is necessarily a while graph or a repeat graph. The latter implies that such a 
cycle is $s(H_i)v$, where $v \in N^+(s(H_i))$. Therefore $H_i$ is prime in $G_i$, meaning that 
(i) holds. To show (ii) and (iii), first observe that any statement graph is 
single-source and reducible. Next, apply induction. For $G_0$, there is nothing to prove. 
Assume it holds for $G_{i-1}$, $i \geq 1$. Let $v_{i-1} \in V(G_{i-1})$ be the vertex that expanded 
into the subgraph $H_i \subseteq G_i$. Then the external neighborhoods of $H_i$ coincide 
with the neighborhoods of $v_{i-1}$, respectively. Consequently, $G_i$ is single-source. 
Now, let $C_i$ be any cycle of $G_i$, if existing. If $C_i \cap H_i = \emptyset$ then $C_i$ is single-entry, 
since $G_{i-1}$ is reducible. Otherwise, if $C_i \subseteq V(H_i)$ the same is valid, since any 
statement graph is reducible. Finally, if $C_i \not\subseteq V(H_i)$, then $v_{i-1}$ is contained in 
a single-entry cycle $C_{i-1}$ of $G_{i-1}$. Then $C_i$ has been formed by $C_{i-1}$, replacing 
$v_{i-1}$ by a path contained in $H_i$. Since $C_{i-1}$ is single-entry, it follows that $C_i$ 
must be so. 


Denote by $\mathcal{H}(G)$ the set of non-trivial prime subgraphs of G. Let $H, H' \in \mathcal{H}(G)$. 
Call H, H' independent when 


— $V(H) \cap V(H') = \emptyset$, or 
— $V(H) \cap V(H') = \{v\}$, where $v = s(H) = t(H')$ or $v = t(H) = s(H')$. 


The following lemma assures that any pair of distinct, non-trivial prime sub- 
graphs of a graph consists of independent subgraphs. 


Lemma 3 Let $H, H' \in \mathcal{H}(G)$. It holds that H, H' are independent. 


Proof. If $V(H) \cap V(H') = \emptyset$ the lemma holds. Otherwise, let $v \in V(H) \cap V(H')$. 
The alternatives $v = s(H) = s(H')$, $v = t(H) = t(H')$, $v \neq s(H), t(H)$ or 
$v \neq s(H'), t(H')$ do not occur because they imply H or H' not to be closed. Next, let 
$v_1, v_2 \in V(H) \cap V(H')$, $v_1 \neq v_2$. In this situation, examine the alternative where 
$v_1 = s(H) = t(H')$ and $v_2 = s(H') = t(H)$. The latter implies that exactly one 
of H or H', say H', is a while graph or a repeat graph. Then there is a cycle 
edge $w\,s(H)$, satisfying $w \in N^-(s(H))$ and $w \in V(H') \setminus \{t(H')\}$. Consequently, 
$w \notin N^+(s(H))$, contradicting H to be closed. The only remaining alternative 
is $V(H) \cap V(H') = \{v\}$, with $v = s(H) = t(H')$ or $v = s(H') = t(H)$. Then 
H, H' are indeed independent (see Figure 6). 


Next, we introduce a concept which is central for the characterization. 

Let G be a graph, $\mathcal{H}(G)$ the set of non-trivial prime subgraphs of G, and 
$H \in \mathcal{H}(G)$. Denote by $G \downarrow H$ the graph obtained from G by contracting H. For 
$v \in V(G)$, the image of v in $G \downarrow H$, denoted $I_{G \downarrow H}(v)$, is 


$I_{G \downarrow H}(v) = \begin{cases} v, & \text{if } v \notin V(H) \\ s(H), & \text{otherwise.} \end{cases}$ 


For $V' \subseteq V(G)$, define the (subset) image of V' in $G \downarrow H$ as $I_{G \downarrow H}(V') = 
\bigcup_{v \in V'} I_{G \downarrow H}(v)$. Similarly, for $H' \subseteq G$, the (subgraph) image of H' in $G \downarrow H$, 
denoted by $I_{G \downarrow H}(H')$, is the subgraph induced in $G \downarrow H$ by the subset of vertices 
$I_{G \downarrow H}(V(H'))$. 


Fig. 6: Independent primes 

The following lemmas are employed in the ensuing characterization. The 
first shows that any prime subgraph $H \subseteq G$ is preserved under contractions of 
different primes. Let G be an arbitrary flow graph, $H, H' \in \mathcal{H}(G)$, $H \neq H'$. 


Lemma 4 $I_{G \downarrow H}(H') \in \mathcal{H}(G \downarrow H)$. 


Proof. Let G be a graph, $H, H' \in \mathcal{H}(G)$, $H \neq H'$. By Lemma 3, H, H' are 
independent. If H, H' are disjoint the contraction of H does not affect H', and 
the lemma holds. Otherwise, by the independence condition, it follows that 
$V(H) \cap V(H') = \{v\}$, where $v = s(H') = t(H)$ or $v = s(H) = t(H')$. 
Examine the first of these alternatives. By contracting H, all neighborhoods of the 
vertices of $I_{G \downarrow H}(H')$ remain unchanged, except that of $I_{G \downarrow H}(s(H'))$, since its 
in-neighborhood becomes equal to $N_G^-(s(H))$. On the other hand, the 
contraction of H cannot introduce new cycles in H'. Consequently, H' preserves 
in $G \downarrow H$ its property of being a non-trivial and closed statement graph, that is, 
of being prime. Finally, suppose $v = s(H) = t(H')$. Again, the neighborhoods of 
the vertices of $I_{G \downarrow H}(H')$ are preserved, except possibly the out-neighborhood 
of $I_{G \downarrow H}(t(H'))$, which becomes $N_G^+(t(H))$, after possibly removing 
self-loops. Consequently, $I_{G \downarrow H}(H') \in \mathcal{H}(G \downarrow H)$. 


Next we prove a commutative law for the order of contractions. 


Lemma 5 If $H, H' \in \mathcal{H}(G)$, then 
$(G \downarrow H) \downarrow (I_{G \downarrow H}(H')) \cong (G \downarrow H') \downarrow (I_{G \downarrow H'}(H))$. 


Proof. Let $A \cong (G \downarrow H) \downarrow (I_{G \downarrow H}(H'))$ and $B \cong (G \downarrow H') \downarrow (I_{G \downarrow H'}(H))$. 
By Lemma 3, H, H' are independent. First, suppose H, H' are disjoint. Then 
$I_{G \downarrow H}(H') = H'$ and $I_{G \downarrow H'}(H) = H$. It follows that, in both graphs A and B, the 
subgraphs H and H' are respectively replaced by a pair of non-adjacent vertices, 
whose in-neighborhoods are $N_G^-(s(H))$ and $N_G^-(s(H'))$, and out-neighborhoods 
$N_G^+(t(H))$ and $N_G^+(t(H'))$, respectively. Then $A \cong B$. In the second alternative, 
suppose H, H' are not disjoint. Then $V(H) \cap V(H') = \{v\}$, where $v = s(H) = 
t(H')$, or $v = t(H) = s(H')$. In both cases, and in both graphs A and B, the 
subgraphs H and H' are contracted into a common vertex w. When $v = s(H) = 
t(H')$, it follows $N_A^-(w) = N_G^-(s(H')) = N_B^-(w)$ and $N_A^+(w) = N_G^+(t(H)) = 
N_B^+(w)$. Finally, when $v = t(H) = s(H')$, we have $N_A^-(w) = N_G^-(s(H)) = N_B^-(w)$, 
while $N_A^+(w) = N_G^+(t(H')) = N_B^+(w)$. Consequently, $A \cong B$ in any situation. 














4.2 Contractile Sequences 
A sequence of graphs $G_0, \ldots, G_k$ is a contractile sequence for a graph G, when 


— $G \cong G_0$, and 
— $G_{i+1} \cong (G_i \downarrow H_i)$, for some $H_i \in \mathcal{H}(G_i)$, $i < k$. Call $H_i$ the contracting 
prime of $G_i$. 


We say $G_0, \ldots, G_k$ is maximal when $\mathcal{H}(G_k) = \emptyset$. In particular, if $G_k$ is the 
trivial graph then $G_0, \ldots, G_k$ is maximal. 


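Let $G_0, \ldots, G_k$ be a contractile sequence of G, and $H_j$ the contracting prime 
of $G_j$. That is, $G_{j+1} \cong (G_j \downarrow H_j)$, $0 \leq j < k$. For $H'_j \subseteq G_j$ and $q \geq j$, the iterated 
image of $H'_j$ in $G_q$ is recursively defined as 


$I_{G_q}(H'_j) = \begin{cases} H'_j, & \text{if } q = j \\ I_{G_q}(I_{G_j \downarrow H_j}(H'_j)), & \text{otherwise.} \end{cases}$ 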


Finally, we describe the characterization on which the recognition algorithm 
for Dijkstra graphs is based. 


Theorem 6 Let G be an arbitrary flow graph, with $G_0, \ldots, G_k$ and $G'_0, \ldots, G'_{k'}$ 
two maximal contractile sequences of G. Then $G_k \cong G'_{k'}$. Furthermore, $k = k'$. 


Proof. Let $G_0, \ldots, G_k$ and $G'_0, \ldots, G'_{k'}$ be two maximal contractile sequences 
of a graph G, denoted respectively by S and S'. Let $H_j$ and $H'_j$ be the contracting 
primes of $G_j$ and $G'_j$, respectively. That is, $G_{j+1} \cong (G_j \downarrow H_j)$ and 
$G'_{j+1} \cong (G'_j \downarrow H'_j)$, for $j < k$ and $j < k'$, respectively. Without loss of generality, 
assume $k \leq k'$. Let i be the largest index such that $G_j \cong G'_j$ for all $j \leq i$. Such an 
index exists since $G \cong G_0 \cong G'_0$. If $i = k$ then $G_k \cong G'_k$, implying $k = k'$, and the 
theorem holds. Otherwise, $i < k$, $G_i \cong G'_i$ and $G_{i+1} \not\cong G'_{i+1}$. Since $G_i \cong G'_i$, it 
follows $H_i \in \mathcal{H}(G'_i)$. By Lemma 4, the iterated image $H'_{i,q}$ of $H_i$ in $G'_q$ is preserved 
as a prime subgraph for all $G'_q$, $q > i$, as long as it does not become the contracting 
prime of $G'_{q-1}$. Since $G'_{k'}$ has no prime subgraph, it follows there exists some index 
p, $i < p < k'$, such that $G'_{p+1} \cong (G'_p \downarrow H'_{i,p})$, where $H'_{i,p}$ represents the iterated 
image of $H_i$ in $G'_p$. Let $H'_{i,p-1}$ be the iterated image of $H_i$ in $G'_{p-1}$. Clearly, 
$H'_{p-1}, H'_{i,p-1} \in \mathcal{H}(G'_{p-1})$, and by Lemma 3, $H'_{p-1}$ and $H'_{i,p-1}$ are independent 
in $G'_{p-1}$. Since $((G'_{p-1} \downarrow H'_{p-1}) \downarrow H'_{i,p}) \cong G'_{p+1}$, by Lemmas 4 and 5, it 
follows that $((G'_{p-1} \downarrow H'_{i,p-1}) \downarrow H''_{p-1}) \cong G'_{p+1}$, where $H''_{p-1}$ represents the image 
of $H'_{p-1}$ in $G'_{p-1} \downarrow H'_{i,p-1}$. Consequently, we have exchanged the positions in S' of 
two contracting primes, respectively at indices $p - 1$ and p, while preserving all 
graphs $G'_q$, for $q < p - 1$ and $q > p$. In particular, this preserves the graph $G'_{k'}$ and 
all graphs lying after $G'_{p+1}$ in S', together with their corresponding contracting 
primes. 

Finally, apply the above operation iteratively, until eventually the iterated 
image of $H_i$ becomes the contracting prime of $G'_i$. In the latter situation, the 
two sequences coincide up to index $i + 1$, while preserving the original graphs 
$G_k$ and $G'_{k'}$. Again, applying such an argument iteratively, we eventually obtain 
that the two sequences coincide, preserving the original graphs $G_k$ and 
$G'_{k'}$. Consequently, $G_k \cong G'_{k'}$ and $k = k'$. 


4.3 The Recognition Algorithm 


We start with a bound for the number m of edges of Dijkstra graphs. 
Lemma 7 Let G be a DG. Then $m \leq 2n - 2$. 


Proof. If G is a DG there is a sequence of graphs $G_0, \ldots, G_k$, where $G_0$ 
is the trivial graph, $G_k = G$ and $G_i$ is obtained from $G_{i-1}$ by expanding an 
X-vertex of $G_{i-1}$ into a statement graph. Apply induction on the number k of 
expansions employed in the construction of G. If $k = 0$ then G is a trivial graph, 
which satisfies the lemma. For $k > 0$, suppose the lemma true for any graph 
$G' = G_i$, $i < k$. In particular, let $G' = G_{k-1}$. Let n' and m' be the number of 
vertices and edges of G', respectively. Then $m' \leq 2n' - 2$. We know that $G_k$ has 
been obtained by expanding a vertex of $G_{k-1}$ into a statement graph H. Discuss 
the alternatives for H. If H is the trivial graph then $n = n'$ and $m = m'$. If H is 
a sequence graph then $n = n' + 1$ and $m = m' + 1$. If H is an if graph, a while 
graph or a repeat graph then $n = n' + 2$ and $m = m' + 3$. If H is an if-then-else 
graph or a p-case graph then $n = n' + p + 1$ and $m = m' + 2p$, where p is the 
outdegree of the source of H. In any of these alternatives, a simple calculation 
implies $m \leq 2n - 2$. 
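The simple calculation can be spelled out case by case. It uses only the induction hypothesis $m' \leq 2n' - 2$ and the increments of n and m listed in the proof.

```latex
\begin{align*}
\text{sequence:} \quad
  & m = m' + 1 \le 2n' - 1 = 2(n - 1) - 1 = 2n - 3 \le 2n - 2, \\
\text{if, while, repeat:} \quad
  & m = m' + 3 \le 2n' + 1 = 2(n - 2) + 1 = 2n - 3 \le 2n - 2, \\
\text{if-then-else, $p$-case:} \quad
  & m = m' + 2p \le 2n' - 2 + 2p = 2(n - p - 1) - 2 + 2p = 2n - 4 \le 2n - 2.
\end{align*}
```

The bound is attained by the trivial graph, where n = 1 and m = 0.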


We can describe an algorithm for recognizing Dijkstra graphs based on 
Theorem 6. We recall that the input is a flow graph with no labels. 
Furthermore, for the moment, assume that G is reducible, otherwise by Lemma 2 it is 
surely not a Dijkstra graph. 

Let G be a reducible flow graph. To apply Theorem 6, we construct a 
contractile sequence $G_0, \ldots, G_k$ of G. That is, find iteratively a non-trivial prime 
subgraph $H_i$ of $G_i$ and contract it, until either the graph becomes trivial or 
no such subgraph exists. In the first case the graph is a DG, while in 
the second it is not. Recall from Lemma 4 that whenever $G_i$ contains another 
prime $H'_i \neq H_i$ then the iterated image of $H'_i$ is preserved, as long as it does 
not become the contracting prime in some later iteration. On the other hand, 
the contraction $G_i \downarrow H_i$ may generate a new prime H', as shown in Figure 7. 

Fig. 7: Generating a new prime H’ 


However, the generation of new primes obeys a rule, described by the lemma 
below. 


Lemma 8 Let G be a reducible graph, $H \in \mathcal{H}(G)$, $H' \in \mathcal{H}(G \downarrow H) \setminus \mathcal{H}(G)$. Then 
s(H) is a proper descendant of s(H') in $G \downarrow H$. 


The above lemma suggests considering special contractile sequences, as 
below. 

Let G be a reducible graph, $G_0, \ldots, G_k$ a contractile sequence C of G, and $H_i$ 
the contracting prime of $G_i$, $0 \leq i < k$. Say that C is a bottom-up (contractile) 
sequence of G when each contracting prime $H_i$ satisfies: $s(H_i)$ is not a descendant 
of s(H), for any prime $H \neq H_i$ of $G_i$. 

The idea of the recognition algorithm is then as follows. Let G be a 
reducible graph. Iteratively, find a lowest vertex v of G such that v is the source of a 
prime subgraph H of G. Then contract H. Stop when no primes exist any more. 

A complete description is given in Algorithm 1. The algorithm 
answers YES or NO, according to whether G is a Dijkstra graph or not. 

The correctness of Algorithm 1 follows basically from Theorem 6 and Lemma 8. 
However, the latter relies on the fact that G is a reducible graph, whereas the 
proposed algorithm considers as input an arbitrary graph. The lemma below 
justifies that we can avoid the step of recognizing reducible graphs. 


Lemma 9 Let G be an arbitrary flow graph input to Algorithm 1. If G is not a 
reducible graph then the algorithm would correctly answer NO. 


Proof. If G is not a reducible graph, let $E_C$ be the set of cycle edges, relative to 
some DFS starting at s(G). Then G contains some cycle C, such that w does 
not separate s(G) from v, where $vw \in E_C$ is the cycle edge of C. Without loss 
of generality, consider the innermost of these cycles. The only way in which the 
edge vw, or any of its possible images, can be contracted is in the context of a 
while or repeat prime subgraph H, in which the cycle would be contracted into 
vertex w, or a possible iterated image of it. However, there is no possibility for H 
to be identified as such, because the edge entering the cycle from outside prevents 
the subgraph from being closed. Consequently, the algorithm necessarily would answer 
NO. 


Algorithm 1: Dijkstra graphs recognition algorithm 
    Input: G, arbitrary flow graph (no labels) 
    Count the number m of edges of G. If m > 2n − 2 then return NO 
    E_C := set of cycle edges of a DFS of G, starting at s(G) 
    v_1, ..., v_n := topological sorting of G − E_C 
    i := n 
    while i ≥ 1 do 
        if G is the trivial graph 
            then return YES, stop 
        if v_i is the source of a prime subgraph H of G 
            then G := G ↓ H 
            else i := i − 1 
    return NO 


As for the complexity, first observe that to decide whether the graph contains 
a non-trivial prime subgraph whose source is a given vertex $v \in V(G)$, we need 
$O(|N^+(v)|)$ steps. Therefore, when considering all vertices of G we require O(m) 
time. There can be O(n) prime subgraphs altogether, and each time some prime 
H is identified, it is contracted, and the size of the graph decreases by |E(H)|. 
The number of steps required to contract H is O(|E(H)|). Hence each edge is 
examined at most a constant number of times during the entire process. Finding 
a topological sorting of a graph can be done in O(m). Thus, the time complexity 
is O(m), that is, O(n), by Lemma 7. 
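The following is a deliberately simple sketch of the recognition idea, not the linear-time Algorithm 1. It contracts any prime it finds and recomputes the cycle edges after each contraction, relying on Theorem 6 for the order of contractions not to matter. It assumes a reducible input whose vertices are all reachable from the source; the statement-graph shapes tested in `prime_at` follow the edge sets implied by Algorithm 2, and all function names are hypothetical.

```python
def cycle_edges(succ, source):
    """Cycle edges of an iterative DFS started at `source`."""
    on_stack, done, back = {source}, set(), set()
    stack = [(source, iter(list(succ[source])))]
    while stack:
        v, it = stack[-1]
        for w in it:
            if w in on_stack:
                back.add((v, w))
            elif w not in done:
                on_stack.add(w)
                stack.append((w, iter(list(succ[w]))))
                break
        else:                       # all out-neighbors of v handled: remove v
            on_stack.discard(v)
            done.add(v)
            stack.pop()
    return back


def prime_at(succ, pred, back, v):
    """Return (vertex set, sink) of a prime subgraph with source v, or None."""
    out = succ[v]
    # Closedness: a cycle edge entering v must come from an out-neighbor of v.
    if any((x, v) in back and x not in out for x in pred[v]):
        return None
    if len(out) == 1:
        (u,) = out
        if pred[u] != {v}:
            return None
        if v not in succ[u]:
            return {v, u}, u                                  # sequence
        if len(succ[u]) == 2:
            (w,) = succ[u] - {v}
            if pred[w] == {u} and v not in succ[w]:
                return {v, u, w}, w                           # repeat
        return None
    if len(out) == 2:
        a, b = out
        for u, w in ((a, b), (b, a)):
            if pred[u] == {v} and v not in succ[w]:
                if succ[u] == {v} and pred[w] == {v}:
                    return {v, u, w}, w                       # while
                if succ[u] == {w} and pred[w] == {v, u}:
                    return {v, u, w}, w                       # if
    if len(out) >= 2 and all(pred[u] == {v} and len(succ[u]) == 1 for u in out):
        targets = set().union(*(succ[u] for u in out))
        if len(targets) == 1:
            (w,) = targets
            if w != v and pred[w] == out and v not in succ[w]:
                return out | {v, w}, w                        # if-then-else, p-case
    return None


def contract(succ, pred, H, s, t):
    """Contract prime H into its source s; the sink t has no out-edges inside H."""
    new_out = set(succ[t])
    for x in H - {s}:
        del succ[x], pred[x]
    succ[s] = new_out
    pred[s] -= H
    for y in new_out:
        pred[y].discard(t)
        pred[y].add(s)


def is_dijkstra_graph(edges, source):
    """Contract primes until none is left; True iff the trivial graph remains."""
    succ, pred = {source: set()}, {source: set()}
    for a, b in edges:
        if a == b:
            return False            # loops never arise from expansions
        for x in (a, b):
            succ.setdefault(x, set())
            pred.setdefault(x, set())
        succ[a].add(b)
        pred[b].add(a)
    while len(succ) > 1:
        back = cycle_edges(succ, source)
        for v in list(succ):
            found = prime_at(succ, pred, back, v)
            if found:
                H, t = found
                contract(succ, pred, H, v, t)
                if source in H:
                    source = v
                break
        else:
            return False            # no prime left and the graph is not trivial
    return True
```

Each round costs O(n + m), so the sketch is quadratic in the worst case. Algorithm 1 avoids the rescanning by visiting the vertices once in reverse topological order.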


5 Isomorphism of Dijkstra Graphs 


In this section, we describe a linear time algorithm for the isomorphism of Dijk- 
stra graphs. 

Given a Dijkstra graph G, the general idea consists of defining a code C(G) for 
G, having the following property. For any two Dijkstra graphs $G_1, G_2$, $G_1 \cong G_2$ 
if and only if $C(G_1) = C(G_2)$. 

As in the recognition algorithm, the codes are obtained by constructing a 
bottom-up contractile sequence of each graph. The codes refer explicitly to the 
statement graphs having source v as depicted in Figures 1 and 2, and consist 
of (linear) strings. For a Dijkstra graph G, the string C(G) that encodes 
G is constructed over an alphabet of symbols containing integers in the range 
$\{1, \ldots, \Delta^+(G) + 4\}$, where $\Delta^+(G)$ is the maximum cardinality among the 
out-neighborhoods of G. Let A, B be a pair of strings. The concatenation of A and 
B, denoted A||B, is the string formed by A, immediately followed by B. 
In order to define the code C(G) for a Dijkstra graph G, we assign an integer, 
named type(H), for each statement graph H, a code C(v) for each vertex $v \in 
V(G)$, and a code C(H) for each prime subgraph H of a bottom-up contractile 
sequence of G. The code C(G) of the graph G is defined as being that of the 
source of G. For a subset $V' \subseteq V(G)$, the code C(V') of V' is the set of strings 
$C(V') = \{C(v_i) \mid v_i \in V'\}$. Write $lex(C(V')) = C(v_1) || \ldots || C(v_r)$ whenever $V' = 
\{v_1, \ldots, v_r\}$ and $C(v_i)$ is lexicographically not greater than $C(v_{i+1})$. 


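Table 1: Statement graph types and codes C(H) of prime subgraphs H 


statement graph | type(H) | C(H), v = s(H) 
trivial         | 1       | 1 
sequence        | 2       | 2 || C(N^+(v)) 
if              | 3       | 3 || C(N^+(v) \ N^{2+}(v)) || C(N^{2+}(v)) 
while           | 4       | 4 || C(N^+(v) ∩ N^-(v)) || C(N^+(v) \ N^-(v)) 
repeat          | 5       | 5 || C(N^+(v)) || C(N^{2+}(v) \ {v}) 
if-then-else    | 6       | 6 || lex(C(N^+(v))) || C(N^{2+}(v)) 
p-case          | p + 4   | (p + 4) || lex(C(N^+(v))) || C(N^{2+}(v)) 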


Next, we describe how to obtain the actual codes. The types of the 
different statement graphs are shown in the second column of Table 1. For a 
vertex $v \in V(G)$, the code C(v) is initially set to 1. Subsequently, if v becomes 
the source of a prime graph H, the string C(v) is updated by implicitly assigning 
C(v) := C(v)||C(H), where C(H) is given by the third column of the table. Such 
an operation is called the expansion of v. It follows that C(H) is written in terms 
of type(H) and the codes of the vertices of H, and so on iteratively. A possible 
expansion of some other vertex $w \in V(G)$ could imply an expansion of v, 
and so on iteratively. Observe that when H is an if-then-else or a p-case graph, 
we have chosen to place the codes of the out-neighbors of s(H) in lexicographic 
ordering. For the remaining statement graphs H, the ordering of the codes of 
the out-neighbors of s(H) is also unique and implicitly imposed by H. When all 
primes associated to C(v) have been expanded, C(v) has reached its final value. 
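The effect of the lexicographic ordering can be seen in a small sketch. Instead of working on the graph, it assumes the sequence of primes is already known and stored as a tree: a vertex is the list of primes it became the source of, and each prime is a triple (kind, branches, sink). This representation, the `TYPE` table and the function `code` are assumptions made for illustration; the integers and the order of concatenation follow the cases of Algorithm 2.

```python
TYPE = {"sequence": 2, "if": 3, "while": 4, "repeat": 5}


def code(vertex):
    """Code of a vertex given as the list of primes it became the source of.

    Each prime is (kind, branches, sink). `branches` lists the vertices lying
    between source and sink: none for a sequence, one for if/while/repeat,
    p for "case" (if-then-else when p = 2, p-case when p >= 3). Branches and
    sinks are vertices in the same list-of-primes form; [] is a plain vertex.
    """
    out = [1]                                   # C(v) is initially 1
    for kind, branches, sink in vertex:
        branch_codes = [code(b) for b in branches]
        if kind == "case":
            out.append(len(branches) + 4)       # type 6 for p = 2, p + 4 in general
            branch_codes.sort()                 # lexicographic order of the codes
        else:
            out.append(TYPE[kind])
        for c in branch_codes:
            out.extend(c)
        out.extend(code(sink))
    return out


plain = []
loop = [("while", [plain], plain)]
# The same if-then-else, written with its two branches in either order.
then_first = [("case", [loop, plain], plain)]
else_first = [("case", [plain, loop], plain)]
```

Both orders yield the code 1 6 1 1 4 1 1 1, since the branch codes are sorted before concatenation. The number of 1's equals the number of vertices of the encoded graph, six in this example.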


5.1 The Isomorphism Algorithm 


Next, we describe the actual formulation of the algorithm. 
Let G be a DG. Algorithm 2 constructs the encoding C(G) for G. 
An example is given in Figure 8. 


5.2 Correctness and Complexity


Theorem 10 Let G, G' be Dijkstra graphs, and C(G), C(G') their codes,
respectively. Then G, G' are isomorphic if and only if C(G) = C(G').

Algorithm 2: Dijkstra graphs isomorphism algorithm

Data: G, DG; E_C, set of cycle edges of G
Find a topological sorting v_1, ..., v_n of G − E_C
for i = n, n−1, ..., 1 do
    C(v_i) := 1
    if v_i is the source of a prime subgraph H then
        C(v_i) := C(v_i) || C(H), where C(H) is
            2 || C(N^+(v_i)),
                if H is a sequence graph;
            3 || C(N^+(v_i) \ N^{2+}(v_i)) || C(N^{2+}(v_i)),
                if H is an if-then graph;
            4 || C(N^+(v_i) ∩ N^-(v_i)) || C(N^+(v_i) \ N^-(v_i)),
                if H is a while graph;
            5 || C(N^+(v_i)) || C(N^{2+}(v_i) \ {v_i}),
                if H is a repeat graph;
            6 || lex(C(N^+(v_i))) || C(N^{2+}(v_i)),
                if H is an if-then-else graph;
            (p + 4) || lex(C(N^+(v_i))) || C(N^{2+}(v_i)),
                if H is a p-case graph.

Proof. By hypothesis, G, G' are isomorphic. We show that this implies C(G) =
C(G'). Following the isomorphism algorithm, observe that the number of 1's in
the strings C(G), C(G') represents the number of vertices of G, G', respectively,
whereas each integer > 1 in the strings represents the contraction of a prime
subgraph. Furthermore, each prime subgraph H, which is initially contained in
the input graph G, corresponds in C(G) to a substring formed by the integer
type(H) followed by one 1, if type(H) = 2; or two 1's, if type(H) = 3; or three
1's, if 4 ≤ type(H) ≤ 6; or type(H) + 1 1's, if type(H) > 6; respectively. Clearly,
the same holds for the graph G' and its code C(G'). The proof is by induction
on the number k of contractions needed to reduce both G and G' to a trivial
vertex. By Theorem 6, k is invariant and applies for both graphs G and G'.
If k = 0 then both G and G' are trivial graphs, and the theorem holds, since
C(G) = C(G') = 1. When k > 0, assume that if G_ and G'_ are isomorphic
DGs which require fewer than k contractions for reduction then C(G_) =
C(G'_). Furthermore, assume also by the induction hypothesis that if v, v' are
vertices of G_, G'_, corresponding to 1's at the same relative positions in C(G_)
and C(G'_), respectively, then v' = f(v), where f is the isomorphism function
between G_ and G'_. Now, consider the graphs G and G'. Choose a prime
subgraph H of G, and let v = s(H). Let v' = f(v) be a vertex of G' corresponding
to v by the isomorphism. Since G ≅ G', it follows that v' is the source of a prime
subgraph H' of G'. Moreover H ≅ H'. Consider the contractions G | H and
G' | H', leading to graphs G_ and G'_, respectively. Let C_(G) and C_(G')
be the strings obtained from C(G) and C(G'), respectively, by contracting the
substrings corresponding to H and H', as above. That is, all the 1's of C(H)
and C(H') are compressed into the positions of v = s(H) and v' = s(H'),
respectively, while the integers type(H) and type(H') become 1, maintaining their
original positions. It follows that C(G_) = C_(G) and C(G'_) = C_(G'). By the
induction hypothesis C(G_) = C(G'_) and the 1's corresponding to v and v' lie
in the same relative positions in the strings. Consequently, by replacing the latter
1's by the substrings which originally represented H and H', we conclude that
indeed C(G) = C(G'), and moreover the induction hypothesis is still verified.
The converse is similar.


[Figure 8: a Dijkstra graph annotated with the codes of its vertices, among them
16||lex(C(·), C(·))||C(·) = 16111 for an inner vertex,
C(v2) = 14||C(v4)||C(v5) = 1412161111, and
C(G) = C(v1) = 16||lex(C(v2), C(v3))||C(v10) = 161312111412161111121.]

Fig. 8: Example for isomorphism algorithm


The corollaries below are direct consequences of Theorem 10. 


Corollary 11 Let G be a DG. The following statements hold.


1. There is a one-to-one correspondence between the 1’s of C(G) and the ver- 
tices of G. 
2. The code C(G) of G is unique and is a representation of G. 


Corollary 12 Let G, G' be DGs and C(G), C(G') their corresponding codes,
satisfying C(G) = C(G'). Then an isomorphism function f between G and G'
can be determined as follows. Let v ∈ V(G) and v' ∈ V(G') correspond to 1's at
identical relative positions in C(G) and C(G'), respectively. Define f(v) := v'.
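Corollary 12 can be illustrated with a short Python sketch. It assumes that
each code is kept as a list of (symbol, vertex) pairs, where the vertex is
recorded for every 1 and is None for the type symbols. This annotated
representation and the function name `isomorphism_from_codes` are choices made
here, not part of the construction above.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Annotated = List[Tuple[int, Optional[Hashable]]]


def isomorphism_from_codes(code_g: Annotated,
                           code_h: Annotated) -> Optional[Dict[Hashable, Hashable]]:
    """Map vertices of G to vertices of G' through the 1's at equal positions.

    Returns None when the symbol sequences differ, that is, C(G) != C(G').
    """
    if [s for s, _ in code_g] != [s for s, _ in code_h]:
        return None
    return {v: w for (s, v), (_, w) in zip(code_g, code_h) if s == 1}


# Two if-then graphs with different vertex names; both have code 1311.
g = [(1, "a"), (3, None), (1, "b"), (1, "c")]
h = [(1, "x"), (3, None), (1, "y"), (1, "z")]
print(isomorphism_from_codes(g, h))

# A sequence graph (code 121) is not isomorphic to an if-then graph.
seq = [(1, "a"), (2, None), (1, "b")]
print(isomorphism_from_codes(seq, h))
```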


Finally, consider the complexity of the isomorphism algorithm. 


Lemma 13 Let G be a Dijkstra graph, and C(G) its code. Then |C(G)| =
n + k ≤ 2n − 1, where n is the number of vertices of G and k the number
of contractions needed to reduce it to a trivial vertex.


Proof. The encoding C(G) consists of exactly n 1's, together with the elements
of a multiset U ⊆ {2, 3, ..., Δ^+(G) + 4}. Each element of U records one
contraction, so |U| = k and |C(G)| = n + k. We know that C(G) starts and ends
with a 1, and it contains no two consecutive elements of U. Therefore
|C(G)| ≤ 2n − 1. When G consists of the induced path P_n, it follows that
|C(P_n)| = 2n − 1, attaining the bound.
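Lemma 13 can be checked numerically on small instances. The Python sketch
below, with codes assumed to be tuples of integers and with hypothetical helper
names, counts the 1's (n) and the remaining symbols (k) of a code, and builds
the code of the path P_n by repeated sequence expansions.

```python
from typing import Tuple

Code = Tuple[int, ...]  # assumed representation: one integer per symbol


def vertices_and_contractions(code: Code) -> Tuple[int, int]:
    """Return (n, k): the number of 1's and the number of symbols greater than 1."""
    n = sum(1 for s in code if s == 1)
    return n, len(code) - n


def path_code(n: int) -> Code:
    """Code of the induced path P_n: every non-final vertex is expanded
    by a sequence graph (type 2)."""
    code: Code = (1,)
    for _ in range(n - 1):
        code = (1, 2) + code
    return code


fig8 = tuple(int(ch) for ch in "161312111412161111121")  # code shown in Figure 8
n, k = vertices_and_contractions(fig8)
print(n, k, len(fig8) <= 2 * n - 1)    # 14 7 True
print(len(path_code(5)) == 2 * 5 - 1)  # True: the bound is attained
```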


Theorem 14 The isomorphism algorithm terminates within O(n) time. 


Proof. Recall that m = O(n), by Lemma 7. The construction of a bottom-up
contractile sequence requires O(n) steps. For each v ∈ V(G), following the
isomorphism algorithm, C(v) can be constructed in time |C(v)|. We remark
that lexicographic ordering takes time linear in the total length of the strings
to be sorted. It follows that the algorithm requires no more than O(n) time to
construct the code C(G) of G.


6 Conclusions 


The analysis of control flow graphs and different forms of structuring have been 
considered in various papers. To our knowledge, no full characterization and no 
recognition algorithm for control flow graphs of structured programs have been 
described before. There are some related classes for which characterizations and
efficient recognition algorithms do exist, e.g. the classes of reducible graphs and
D-charts. However, both classes contain, and are much larger than, the class of
Dijkstra graphs.

An important question solved in this paper is that of recognizing whether
two control flow graphs (of structured programs) are syntactically equivalent,
i.e., isomorphic. Such a question fits in the area of code similarity analysis, with
applications in clone detection, plagiarism and software forensics.

Since the establishment of structured programming, some new statements
have been proposed in addition to the original structures which form classical
structured programming, enlarging the collection of allowed statements. Some
of these statements are depicted in Figure 9.


(a) break-while: Allows an early exit from a while statement; 

(b) continue-while: Allows a while statement to proceed, after its original ter- 
mination; 

(c) break-repeat: Allows an early exit from a repeat statement; 

(d) continue-repeat: Allows a repeat statement to proceed, after its original ter- 
mination; 

(e) divergent-if-then-else: A selection statement, similar to the standard
if-then-else, except that the branches do not converge afterwards to the same
point, but lead to disjoint structures. Note that the corresponding graph no
longer has a (unique) sink.










Fig. 9: Generalized Dijkstra graphs 


In fact, the inclusion of some of the above additional control blocks in
structured programming was already anticipated in some papers, such as [17].
The basic ideas and techniques described in the present work can be generalized,
so as to efficiently recognize graphs that incorporate the above statements, in
addition to those of Dijkstra graphs. The same holds for the isomorphism
algorithm.